current study
It's One of the Hardest Confrontations Anyone Can Have. It Might Be One Good Use of a Controversial Technology.
"Why Did You Do It?" A radical new use of deepfake technology is allowing survivors of abuse to confront their perpetrators. Marina vd Roest hadn't faced the man who abused her in decades when she first sat down in front of the laptop. Confronted with his realistic, blinking, speaking face, she felt "scared like a little child again." "Sometimes I had to close the laptop and get my breath back before opening it and continuing with the conversation," she says. Vd Roest is one of the first people to have tried out a radical new form of therapy that involves putting survivors face-to-face with A.I.-generated deepfakes of their attackers as a means of healing unresolved trauma.
Predicting Road Crossing Behaviour using Pose Detection and Sequence Modelling
Dasgupta, Subhasis, Saha, Preetam, Roy, Agniva, Sen, Jaydip
The world is rapidly advancing toward a future where artificial intelligence (AI) takes a central role in many everyday activities. In business, for example, robots have become indispensable in manufacturing processes and warehouse management. These robots efficiently handle tasks such as stacking and removing items, optimizing various business operations. In aviation, autopilot systems have been a standard feature in airplanes for many years, enhancing flight safety and efficiency. Similarly, in many developed countries, vehicles equipped with autopilot capabilities are becoming increasingly common. These self-driving vehicles are designed with an array of sensors and high-resolution cameras to monitor their surroundings, detect objects, and take necessary actions to prevent collisions or accidents. While these autonomous vehicles perform admirably on highways where the primary concern is other vehicles, they face significant challenges in busy urban environments. In such settings, it is often advisable for drivers to switch from autopilot to manual control. This is particularly crucial in bustling market areas where pedestrian behaviour can be unpredictable.
Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
Gogoi, Parismita, Kalita, Sishir, Lalhminghlui, Wendy, Terhiija, Viyazonuo, Tzudir, Moakala, Sarmah, Priyankoo, Prasanna, S. R. M.
This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North Eastern India: Angami, Ao, and Mizo. We evaluate four Wav2vec2.0 base models that were pre-trained on both tonal and non-tonal languages. We analyze tone-wise performance across the layers for all three languages and compare the different models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, regardless of the pre-training language, i.e. tonal or non-tonal. We have also found that the tone inventory, tone types, and dialectal variations affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in low-resource settings. The source code is available on GitHub.
Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition
Foley, Sean, Nguyen, Hong, Lee, Jihwan, Kadiri, Sudarsana Reddy, Byrd, Dani, Goldstein, Louis, Narayanan, Shrikanth
Although many previous studies have carried out multimodal learning with real-time MRI data that captures the audio-visual kinematics of the vocal tract during speech, these studies have been limited by their reliance on multi-speaker corpora. This prevents such models from learning a detailed relationship between acoustics and articulation due to considerable cross-speaker variability. In this study, we develop unimodal audio and video models as well as multimodal models for phoneme recognition using a long-form single-speaker MRI corpus, with the goal of disentangling and interpreting the contributions of each modality. Audio and multimodal models show similar performance on different phonetic manner classes but diverge on places of articulation. Interpretation of the models' latent space shows similar encoding of the phonetic space across audio and multimodal models, while the models' attention weights highlight differences in acoustic and articulatory timing for certain phonemes.
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV
Gabitashvili, Alexander, Kellmeyer, Philipp
The intensive care unit (ICU) is a crucial hospital department that handles life-threatening cases. Machine learning (ML) is now widely used in healthcare. In recent years, ICU management has become one of the most significant parts of hospital operations (largely, but not only, due to the worldwide COVID-19 pandemic). This study explores multiple ML approaches for predicting length of stay (LOS) in the ICU, specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and neural networks (LSTM, BERT and Temporal Fusion Transformer). Given that LOS prediction is often framed as a classification task, this study categorizes LOS into three groups: less than two days, less than a week, and a week or more. As the first ML-based approach targeting LOS prediction for neurological disorder patients, this study does not aim to outperform existing methods but rather to assess their effectiveness in this specific context. The findings provide insights into the applicability of ML techniques for improving ICU resource management and patient care. According to the results, the Random Forest model outperformed the others on static data, achieving an accuracy of 0.68, a precision of 0.68, a recall of 0.68, and an F1-score of 0.67. The BERT model outperformed the LSTM model on time-series data, with an accuracy of 0.80, a precision of 0.80, a recall of 0.80, and an F1-score of 0.80.
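The three-way LOS grouping described in this abstract can be illustrated with a minimal sketch. It assumes that "less than a week" means two days or more but under seven days, and that LOS is measured in days. The function name and the class labels ("short", "medium", "long") are hypothetical and do not come from the paper.

```python
def los_category(los_days: float) -> str:
    """Map a length of stay in days to one of three classes.

    Assumed boundaries (an interpretation of the abstract, not the
    authors' code): < 2 days, 2 to < 7 days, >= 7 days.
    """
    if los_days < 0:
        raise ValueError("length of stay cannot be negative")
    if los_days < 2:
        return "short"    # less than two days
    if los_days < 7:
        return "medium"   # two days or more, but less than a week
    return "long"         # a week or more


if __name__ == "__main__":
    # Illustrative values only; these are not taken from MIMIC-IV.
    for days in (0.5, 2.0, 6.9, 7.0, 21.0):
        print(days, los_category(days))
```

Framing LOS this way turns a skewed regression target into a three-class problem, which is why the abstract reports accuracy, precision, recall, and F1-score.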
An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus
Fatahi, Mohammad, Zadeh, Danial Sadrian, Ghojogh, Benyamin, Moshiri, Behzad, Basir, Otman
Autonomous vehicles represent a revolutionary advancement driven by the integration of artificial intelligence within intelligent transportation systems. However, they remain vulnerable due to the absence of robust security mechanisms in the Controller Area Network (CAN) bus. In order to mitigate the security issue, many machine learning models and strategies have been proposed, which primarily focus on a subset of dominant patterns of anomalies and lack rigorous evaluation in terms of reliability and robustness. Therefore, to address the limitations of previous works and mitigate the security vulnerability in CAN bus, the current study develops a model based on the intrinsic nature of the problem to cover all dominant patterns of anomalies. To achieve this, a cascade feature-level fusion strategy optimized by a two-parameter genetic algorithm is proposed to combine temporal and spatial information. Subsequently, the model is evaluated using a paired t-test to ensure reliability and robustness. Finally, a comprehensive comparative analysis conducted on two widely used datasets advocates that the proposed model outperforms other models and achieves superior accuracy and F1-score, demonstrating the best performance among all models presented to date.
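The paired t-test mentioned in this abstract compares two models that were scored on the same evaluation runs. The following is a minimal sketch of the test statistic only, using the Python standard library. The function name and the example scores are hypothetical, and the paper's actual evaluation protocol (number of runs, metric, significance level) is not given in the abstract.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the per-run
    differences between the two models and n is the number of runs.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Made-up per-run F1-scores for two hypothetical models.
    proposed = [0.97, 0.96, 0.98, 0.97, 0.95]
    baseline = [0.94, 0.95, 0.95, 0.93, 0.94]
    t, dof = paired_t_statistic(proposed, baseline)
    print(f"t = {t:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against a Student's t distribution with n - 1 degrees of freedom to obtain a p-value. A library routine such as SciPy's `scipy.stats.ttest_rel` performs both steps.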
Understanding the Impact of News Articles on the Movement of Market Index: A Case on Nifty 50
Dasgupta, Subhasis, Satpati, Pratik, Choudhary, Ishika, Sen, Jaydip
In the recent past, there have been several works on the prediction of stock prices using different methods. Sentiment analysis of news and tweets, and its relationship to the movement of stock prices, has already been explored. However, news covers several topics, such as politics, markets, and sports. Most prior analyses either dealt with news or comments associated with particular stocks only, or considered overall sentiment scores only. Yet it is quite possible that different topics have different levels of impact on the movement of a stock price or an index. The current study focused on bridging this gap by analysing the movement of the Nifty 50 index with respect to the sentiments associated with news items on various topics such as sports, politics, and markets. The study established that sentiment scores of news items on topics other than markets also have a significant impact on the movement of the index.
Can a Machine Distinguish High and Low Amount of Social Creak in Speech?
Laukkanen, Anne-Maria, Kadiri, Sudarsana Reddy, Narayanan, Shrikanth, Alku, Paavo
Objectives: Increased prevalence of social creak, particularly among female speakers, has been reported in several studies. The study of social creak has previously been conducted by combining perceptual evaluation of speech with conventional acoustical parameters such as the harmonic-to-noise ratio and cepstral peak prominence. In the current study, machine learning (ML) was used to automatically distinguish speech with a low amount of social creak from speech with a high amount of social creak. Methods: The amount of creak in continuous speech samples produced in Finnish by 90 female speakers was first perceptually assessed by two voice specialists. Based on their assessments, the speech samples were divided into two categories (low vs. high amount of creak). Using the speech signals and their creak labels, seven different ML models were trained. Three spectral representations were used as features for each model. Results: The results show that the best performance (accuracy of 71.1%) was obtained by the following two systems: an Adaboost classifier using the mel-spectrogram feature and a decision tree classifier using the mel-frequency cepstral coefficient feature. Conclusions: The study of social creak is becoming increasingly popular in sociolinguistic and vocological research. The conventional human perceptual assessment of the amount of creak is laborious, and therefore ML technology could be used to assist researchers studying social creak. The classification systems reported in this study could be considered as baselines in future ML-based studies on social creak.
Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
Mavi, John, Summers, Nathan, Coronado, Sergio
The current paper presents the development and validation of SelfScore, a novel benchmark designed to assess the performance of automated Large Language Model (LLM) agents on help desk and professional consultation tasks. Given the increasing integration of AI in industries, particularly within customer service, SelfScore fills a crucial gap by enabling the comparison of automated agents and human workers. The benchmark evaluates agents on problem complexity and response helpfulness, ensuring transparency and simplicity in its scoring system. The study also develops automated LLM agents to assess SelfScore and explores the benefits of Retrieval-Augmented Generation (RAG) for domain-specific tasks, demonstrating that automated LLM agents incorporating RAG outperform those without. All automated LLM agents were observed to perform better than the human control group. Given these results, the study raises concerns about the potential displacement of human workers, especially in areas where AI technologies excel. Ultimately, SelfScore provides a foundational tool for understanding the impact of AI in help desk environments while advocating for ethical considerations in the ongoing transition towards automation.
Analyzing Consumer Reviews for Understanding Drivers of Hotels Ratings: An Indian Perspective
Dasgupta, Subhasis, Roy, Soumya, Sen, Jaydip
In the internet era, almost every business entity is trying to establish a digital footprint in digital media and on social media platforms. For these entities, word of mouse is also very important. This is particularly crucial for the hospitality sector, which covers hotels, restaurants, etc. Consumers read other consumers' reviews before making final decisions. It is therefore very important to understand which aspects matter most to consumers when they give their ratings. The current study focuses on consumer reviews of Indian hotels to extract the aspects important for final ratings. The study involves gathering data using web scraping methods, analyzing the texts using Latent Dirichlet Allocation for topic extraction, and applying sentiment analysis for aspect-specific sentiment mapping. Finally, it incorporates Random Forest to understand the importance of the aspects in predicting the final rating of a user.